Demonstration of the CROSSMARC System
Authors
Vangelis Karkaletsis, Constantine D. Spyropoulos, Dimitris Souflis, Claire Grover, Ben Hachey, Maria Teresa Pazienza, Michele Vindigni, Emmanuel Cartier, José Coch
Institute for Informatics and Telecommunications, NCSR “Demokritos”: {vangelis, costass}@iit.demokritos.gr
Velti S.A.: [email protected]
Division of Informatics, University of Edinburgh: {grover, bhachey}@ed.ac.uk
D.I.S.P., Università di Roma Tor Vergata: {pazienza, vindigni}@info.uniroma2.it
Lingway: {emmanuel.cartier, Jose.Coch}@lingway.com
Similar resources
Information Retrieval and Extraction from the Web: the CROSSMARC approach
The paper presents the CROSSMARC approach for the complex task of identification of interesting web sites and web pages and the extraction of information from them. This task is hard because most of the information on the Web today is in the form of HTML documents, which are designed for presentation purposes and not for automatic extraction systems. This task becomes even harder in a multiling...
Named Entity Recognition in Greek Web Pages
We describe the functionalities of the Hellenic Named Entity Recognition and Classification (HNERC) system developed in the context of the CROSSMARC project. CROSSMARC is developing technology for e-retail product comparison. The CROSSMARC system locates relevant retailers’ web pages and processes them in order to extract information about their products (e.g. technical features, prices). CROSS...
Domain-Specific Web Site Identification: The CROSSMARC Focused Web Crawler
This paper presents techniques for identifying domain-specific web sites that have been implemented as part of the EC-funded R&D project, CROSSMARC. The project aims to develop technology for extracting interesting information from domain-specific web pages. It is therefore important for CROSSMARC to identify web sites in which interesting domain-specific pages reside (focused web crawling). Th...
Use of Ontologies for Cross-lingual Information Management in the Web
We present the ontology-based approach for cross-lingual information management of web content that has been developed by the EC-funded project CROSSMARC. CROSSMARC can be perceived as a meta-search engine, which identifies domain-specific information from the Web. To achieve this, it employs agents for web crawling, spidering, information extraction from web pages, data storage, and data present...
Cross-lingual Information Extraction from Web pages: the use of a general-purpose Text Engineering Platform
In this paper we present how the use of a general-purpose text engineering platform has facilitated the development of a cross-lingual information extraction system and its adaptation to new domains and languages. Our approach for cross-lingual information extraction from the Web covers all the way from the identification of Web sites of interest, to the location of the domain-specific Web pages,...